Data Representativity for Machine Learning and AI Systems
Data representativity is crucial when drawing inference from data through machine learning models. Scholars have increasingly focused on unraveling bias and fairness in models, including biases inherent in the input data. However, limited work exists on the representativity of samples (datasets) for appropriate inference in AI systems. This paper analyzes data representativity in scientific literature related to AI and sampling, and gives a brief overview of statistical sampling methodology from disciplines such as sampling of physical materials, experimental design, survey analysis, and observational studies. Different notions of a 'representative sample' exist in past and present literature. In particular, the contrast between a representative sample in the sense of coverage of the input space, versus a representative sample as a miniature of the target population, is relevant when building AI systems. Using empirical demonstrations on US Census data, we show that the former is useful for providing equality and demographic parity, and is more robust to distribution shifts, whereas the latter is useful when the purpose is to make historical inference, to draw inference about the underlying population in general, or to make better predictions for the majority of that population. We propose a framework of questions for creating and documenting data with representativity in mind, as an addition to existing datasheets for datasets. Finally, we call for caution regarding implicit, as well as explicit, use of a notion of data representativeness without specific clarification.
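The contrast between the two notions of representativity can be illustrated with a minimal sketch. The population below is a hypothetical stand-in for census-style data with a majority and a minority group (the group labels and 90/10 split are assumptions for illustration, not figures from the paper): a "miniature" sample mirrors the population's group proportions, while a "coverage" sample draws equally from each group in the input space.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical population: 90% of records in group "A", 10% in group "B"
# (illustrative stand-in for a demographic attribute; not data from the paper).
population = ["A"] * 900 + ["B"] * 100

def miniature_sample(pop, n):
    """'Miniature' notion: a simple random sample, which mirrors
    the population's group proportions in expectation."""
    return random.sample(pop, n)

def coverage_sample(pop, n):
    """'Coverage' notion: sample equally from each group,
    covering the input space regardless of group size."""
    groups = {}
    for x in pop:
        groups.setdefault(x, []).append(x)
    per_group = n // len(groups)
    out = []
    for members in groups.values():
        out.extend(random.sample(members, per_group))
    return out

print(Counter(miniature_sample(population, 100)))  # roughly 90 "A" to 10 "B"
print(Counter(coverage_sample(population, 100)))   # exactly 50 "A", 50 "B"
```

The miniature sample serves inference about the population as a whole (and the majority group), whereas the coverage sample gives both groups equal weight, which is the property underlying demographic parity.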
Could Artificial Intelligence REALLY Wipe Out Humanity? - AI Summary
Even prominent figures in science such as Stephen Hawking and Elon Musk have been vocal about technology's threat to humanity. The fact of the matter is that machines generally operate as they are programmed to, and we are a long way from developing the artificial superintelligence (ASI) needed for such a "takeover" to even be feasible. At present, most AI technology used by machines is considered "narrow" or "weak," meaning it can apply its knowledge to only one or a few tasks. "Machine learning and AI systems are a long way from cracking the hard problem of consciousness and being able to generate their own goals contrary to their programming," George Montanez, a data scientist at Microsoft, wrote under the same Metafact thread. Still, risks do exist, including overoptimization, weaponization, and ecological collapse, according to Ben Nye, Director of Learning Sciences at the University of Southern California's Institute for Creative Technologies (USC-ICT).
How Machine Learning Is Supercharging Content Management - Liwaiwai
Machine learning and artificial intelligence (AI) are among the hottest buzzwords around, especially in the open source community. It seems that every month brings a new machine learning system, each focused on a different application. The good news is that because many of these frameworks were developed by academics, they are often open source by default. Even Google's own neural network software library, TensorFlow, is (at least for now) open source. The bad news is that many of these frameworks are designed for high-end applications and require considerable experience to deploy effectively.